audio synthesis.
First, we present the audio used in this work.
Next, we demonstrate controllable sound synthesis. As described in Section 4.3 of the paper, we specify the target pitch and instrument, and then sample the pitch code and the timbre code from their respective conditional distributions. In the following demonstration, we specify the same pitches for all instruments, play the audio, and display the corresponding Mel-spectrograms.
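The sampling step can be illustrated with a minimal sketch. It assumes each pitch and each instrument has its own diagonal Gaussian in latent space; the array names, sizes, and random parameters below are hypothetical stand-ins for learned values, not the model from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; the real model's dimensions are not given here.
N_PITCHES, N_INSTRUMENTS, LATENT_DIM = 12, 4, 16

# Stand-ins for learned per-class Gaussian parameters (random here).
pitch_means = rng.normal(size=(N_PITCHES, LATENT_DIM))
pitch_stds = np.full((N_PITCHES, LATENT_DIM), 0.1)
timbre_means = rng.normal(size=(N_INSTRUMENTS, LATENT_DIM))
timbre_stds = np.full((N_INSTRUMENTS, LATENT_DIM), 0.1)


def sample_code(means, stds, label, rng):
    """Draw one latent code from the diagonal Gaussian of class `label`."""
    noise = rng.standard_normal(means.shape[1])
    return means[label] + stds[label] * noise


def sample_codes(pitch, instrument, rng):
    """Sample a (pitch code, timbre code) pair for the requested labels."""
    z_pitch = sample_code(pitch_means, pitch_stds, pitch, rng)
    z_timbre = sample_code(timbre_means, timbre_stds, instrument, rng)
    return z_pitch, z_timbre


# Same pitch for every instrument, as in the demonstration.
codes = [sample_codes(pitch=3, instrument=i, rng=rng) for i in range(N_INSTRUMENTS)]
# A trained decoder (not shown) would map each pair to a Mel-spectrogram.
```

Only the sampling is shown; decoding to a spectrogram and converting it to audio depend on the trained model and are left out.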
As described in Section 4.4 of the paper, we first infer the pitch code and the timbre code of the source input, and then modify the timbre code by adding a shift toward the target instrument.
Note that, in practice, timbre transfer does not require labels for the source instrument or pitch, since both are inferred automatically from the source input. The model infers the mixture component (the source instrument identity) to which the source timbre code belongs, and the shift applied to the timbre code is then obtained by subtracting the mean of the source's mixture component from the mean of the target's mixture component.
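The transfer step might look like the sketch below. It makes one simplifying assumption: the source component is picked as the nearest component mean, which stands in for the model's own inference of the instrument identity. The function names and the toy two-dimensional means are hypothetical.

```python
import numpy as np


def infer_component(z_timbre, component_means):
    """Return the index of the nearest component mean.

    Assumption: nearest-mean lookup replaces the model's learned inference.
    """
    distances = np.linalg.norm(component_means - z_timbre, axis=1)
    return int(np.argmin(distances))


def transfer_timbre(z_timbre, component_means, target):
    """Shift a timbre code from its inferred source component to `target`."""
    source = infer_component(z_timbre, component_means)
    # Shift = mean of target component minus mean of source component.
    shift = component_means[target] - component_means[source]
    return z_timbre + shift, source


# Toy example: three instruments in a 2-D timbre space.
means = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
z_source = np.array([0.3, -0.2])
z_transfer, source = transfer_timbre(z_source, means, target=2)
```

The pitch code is left untouched, so decoding the unchanged pitch code together with the shifted timbre code would keep the note while changing the instrument.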
Following Fig. 5 in the paper, we demonstrate , , , and .
The source instrument is gradually changed to the target instrument, by .
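A gradual change of this kind can be sketched by scaling the source-to-target shift with a weight that runs from 0 to 1. The weight, its linear schedule, and the function name are assumptions made for illustration; the exact expression used on this page is not recoverable from the text.

```python
import numpy as np


def interpolate_timbre(z_source, mean_source, mean_target, steps):
    """Return timbre codes moving linearly from the source toward the target."""
    shift = mean_target - mean_source
    # Hypothetical schedule: evenly spaced weights from 0 (source) to 1 (target).
    weights = np.linspace(0.0, 1.0, steps)
    return [z_source + w * shift for w in weights]


timbre_path = interpolate_timbre(
    z_source=np.array([0.3, -0.2]),
    mean_source=np.array([0.0, 0.0]),
    mean_target=np.array([0.0, 4.0]),
    steps=5,
)
```

The first code in the list equals the source timbre code and the last one is the fully transferred code; decoding each in turn would give the intermediate sounds.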